Non-intuitive conditional independence facts hold in models of network data

Authors

  • Marc Maier
  • Katerina Marazopoulou
  • David Jensen
Abstract

Many social scientists and researchers across a wide range of fields focus on analyzing a single causal dependency or a conditional model of some outcome variable. However, to reason about interventions or conditional independence, it is useful to construct a joint model of a domain. Researchers in computer science, statistics, and philosophy have developed representations (e.g., Bayesian networks), theory, and learning algorithms for causal discovery of joint models from observational data. Bayesian networks are graphical models that encode joint probability distributions over a system of variables, and they can be interpreted causally under a few assumptions [7, 8]. The rules of d-separation, a set of graphical criteria, are the foundation for algorithmic derivation of the conditional independence facts implied by the structure of Bayesian networks [1]. This theory connects causal structure with conditional independence and can be leveraged to learn causal models from observational data. Accurate reasoning about conditional independence facts is the basis for constraint-based algorithms that learn the structure of Bayesian networks (e.g., PC [8]).

Bayesian networks, as well as most analytic methods for causal analysis (e.g., linear regression), assume that data instances are independent and identically distributed (IID). However, many real-world systems involve multiple types of interacting entities with probabilistic dependencies among the variables on those entities. For example, citation data involve researchers collaborating on scholarly papers that cite prior work. Over the past 15 years, researchers in statistics and computer science have devised more expressive classes of directed graphical models, such as probabilistic relational models (PRMs), which remove the assumption of IID data in order to model network, or relational, data [2]. Relational models more closely represent the real-world domains that many social scientists and other researchers investigate.
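As a concrete illustration (not part of the paper), d-separation on an ordinary DAG can be tested with the standard moralization criterion: restrict the graph to the ancestors of the queried variables, "marry" co-parents, drop edge directions, delete the conditioning set, and check reachability. The function and variable names below are our own; this is a minimal sketch of the classical criterion, not the paper's method.

```python
from collections import defaultdict, deque

def d_separated(edges, xs, ys, zs):
    """Test whether node sets xs and ys are d-separated given zs in a DAG.

    edges: iterable of (parent, child) pairs.
    Implements the moralization criterion: take the ancestral subgraph of
    xs | ys | zs, moralize it, delete zs, and check that no undirected
    path connects xs to ys.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = defaultdict(set)
    for p, c in edges:
        parents[c].add(p)

    # 1. Ancestral closure of xs | ys | zs.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])

    # 2. Moralize: undirected parent-child edges plus co-parent "marriages".
    adj = defaultdict(set)
    for c in anc:
        ps = parents[c] & anc
        for p in ps:
            adj[p].add(c)
            adj[c].add(p)
            for q in ps:
                if p != q:
                    adj[p].add(q)

    # 3. Delete zs, then search for any path from xs to ys.
    seen, queue = set(), deque(xs - zs)
    while queue:
        n = queue.popleft()
        if n in ys:
            return False  # connecting path found: not d-separated
        if n in seen:
            continue
        seen.add(n)
        queue.extend(adj[n] - zs - seen)
    return True
```

On the chain A → B → C, A and C are d-connected marginally but d-separated given B; on the collider A → C ← B, the pattern reverses, which is exactly the structure that constraint-based learners such as PC exploit.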
To successfully learn causal models from observational, network data, we need a similar theory for deriving conditional independence from relational models. In this paper, we explain why d-separation does not correctly produce conditional independence facts when applied directly to the structure of relational models. We show that non-intuitive conditional independence facts hold in models of network data through relationally d-connecting paths that only manifest in ground graphs. We describe the abstract ground graph, a lifted representation recently introduced by Maier et al. [4], that enables a sound, complete, and computationally efficient method for deriving conditional independence facts from relational models.

[Figure: d-separating path elements (exists one on path) vs. d-connecting path elements (exists all on path)]
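To see why ground graphs matter, consider a hypothetical toy relational model (our own construction, illustrative of the phenomenon the paper describes, not an example taken from it): a single class-level dependency Researcher.skill → Paper.quality, grounded over a small skeleton in which one paper has two authors. The grounding creates a collider that is invisible at the class level.

```python
# Hypothetical skeleton: which researchers authored which papers.
authorship = {"P1": ["R1", "R2"], "P2": ["R2"]}

# Ground graph: one node per attribute instance, one directed edge per
# instantiation of the class-level dependency skill -> quality.
ground_edges = [(f"{r}.skill", f"{p}.quality")
                for p, rs in authorship.items() for r in rs]

# P1.quality now has two parents (a collider), so R1.skill and R2.skill
# become dependent once P1.quality is observed -- a conditional
# independence fact that cannot be read off the class-level model,
# which contains only a single skill -> quality edge.
parents_of_p1 = sorted(s for s, t in ground_edges if t == "P1.quality")
```

Reasoning over a representation that abstracts across all such groundings, rather than over the class-level model itself, is what the abstract ground graph provides.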


Related articles

Reasoning about Independence in Probabilistic Models of Relational Data

Bayesian networks leverage conditional independence to compactly encode joint probability distributions. Many learning algorithms exploit the constraints implied by observed conditional independencies to learn the structure of Bayesian networks. The rules of d-separation provide a theoretical and algorithmic framework for deriving conditional independence facts from model structure. However, t...

Full text

Directed cyclic graphs, conditional independence, and non-recursive linear structural equation models

Recursive linear structural equation models can be represented by directed acyclic graphs. When represented in this way, they satisfy the Markov Condition. Hence it is possible to use the graphical d-separation criterion to determine what conditional independence relations are entailed by a given linear structural equation model. I prove in this paper that it is also possible to use the graphical d-separ...

Full text

Conditional Dependence in Longitudinal Data Analysis

Mixed models are widely used to analyze longitudinal data. In their conventional formulation as linear mixed models (LMMs) and generalized LMMs (GLMMs), a commonly indispensable assumption in settings involving longitudinal non-Gaussian data is that the longitudinal observations from subjects are conditionally independent, given subject-specific random effects. Although conventional Gaussian...

Full text

Robust portfolio selection with polyhedral ambiguous inputs

Ambiguity in model inputs is typical, especially in the portfolio selection problem, where the true distribution of random variables is usually unknown. Here we use a robust optimization approach to address the ambiguity in the conditional-value-at-risk minimization model. We obtain explicit models of robust conditional-value-at-risk minimization for polyhedral and correlated polyhedral am...

Full text

Markov Properties for Linear Causal Models with Correlated Errors

A linear causal model with correlated errors, represented by a DAG with bi-directed edges, can be tested by the set of conditional independence relations implied by the model. A global Markov property specifies, by the d-separation criterion, the set of all conditional independence relations holding in any model associated with a graph. A local Markov property specifies a much smaller set of co...

Full text



Publication date: 2013